# Gulf of Mexico SST MNAR Benchmark Dataset

# Overview

This dataset contains daily sea surface temperature (SST) observations from the NASA Aqua MODIS Level-3 Mapped (L3m) satellite product over the Louisiana coastal Gulf of Mexico (2022-2024), paired with NDBC buoy ground-truth records from four in-situ stations.

Cloud cover renders a large fraction of infrared satellite pixels unobserved on any given day. This dataset preserves those gaps as NaN rather than imputing them: the missingness structure, treated here as missing not at random (MNAR), is the subject of analysis and is not altered in any file.


# Key Statistics


- Sensor: NASA Aqua MODIS L3m (4 km daytime SST)
- Spatial domain: 28.43°N to 30.71°N, 95.58°W to 88.63°W
- Temporal coverage: January 1, 2022 to December 31, 2024 (956 days of data)
- Total grid cells: 6,974,694
- Cloud-masked cells: 61.21% (preserved as NaN)
- MNAR bias: 1.5 °C mean SST difference between observed and cloud-masked regions
- Buoy records: ~541,000 hourly in-situ observations across 4 NDBC stations
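As a quick sanity check, a few numbers can be derived from the statistics above. The sketch below is illustrative only: the derived cell counts come from the rounded percentage, so they are approximations and not figures reported by the dataset.

```python
from datetime import date

# Figures copied from the Key Statistics.
start, end = date(2022, 1, 1), date(2024, 12, 31)
dataset_days = 956
total_cells = 6_974_694
masked_pct = 61.21

# Calendar days in the coverage window, counting both endpoints.
calendar_days = (end - start).days + 1

# Share of the calendar window that the dataset days account for.
day_share = dataset_days / calendar_days

# Approximate cell counts implied by the rounded percentage.
masked_cells = round(total_cells * masked_pct / 100)
observed_cells = total_cells - masked_cells

print(calendar_days)
print(f"{day_share:.1%}")
print(masked_cells, observed_cells)
```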


# Files

# Satellite SST

gulf_sst_3yr_2022_2024.csv

Daily Aqua MODIS SST consolidated across 956 days.

Columns: Lat, Lon, Date, SST

A NaN in the SST column marks a cloud-masked pixel (MNAR gap).
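A minimal sketch of reading this file layout and measuring the cloud-masked fraction, using only the Python standard library. The column names come from the description above. The date format, the sign convention for Lon, and whether a gap is stored as the text `NaN` or as an empty field are assumptions, so the parser accepts both spellings of a gap. The sample rows are made up for illustration.

```python
import csv
import io
import math


def parse_sst(value):
    """Return SST as a float; blank fields and 'NaN' both become nan."""
    text = value.strip()
    if text == "":
        return math.nan
    return float(text)  # float("NaN") evaluates to nan


def missing_fraction(handle):
    """Fraction of rows whose SST is NaN, i.e. cloud-masked."""
    total = 0
    missing = 0
    for row in csv.DictReader(handle):
        total += 1
        if math.isnan(parse_sst(row["SST"])):
            missing += 1
    return missing / total if total else math.nan


# Tiny in-memory stand-in for the real CSV (hypothetical values).
sample = io.StringIO(
    "Lat,Lon,Date,SST\n"
    "29.02,-90.10,2022-01-01,18.4\n"
    "29.02,-90.06,2022-01-01,NaN\n"
    "29.06,-90.10,2022-01-01,\n"
    "29.06,-90.06,2022-01-01,18.9\n"
)
print(missing_fraction(sample))
```

For the real file, the same function can be called on a handle opened with `open("gulf_sst_3yr_2022_2024.csv", newline="")`.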

# Buoy Ground Truth (NDBC)

| File | Contents |
| --- | --- |
| buoy_station_42001_2022_2024.csv | Station 42001, hourly in-situ SST 2022-2024 |
| buoy_station_42035_2022_2024.csv | Station 42035, hourly in-situ SST 2022-2024 |
| buoy_station_42036_2022_2024.csv | Station 42036, hourly in-situ SST 2022-2024 |
| buoy_station_42040_2022_2024.csv | Station 42040, hourly in-situ SST 2022-2024 |
| buoy_combined_all_stations.csv | All four stations merged (~541,000 records) |
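Because the buoy records are hourly and the satellite SST is daily, comparing the two needs some temporal alignment. The sketch below shows one possible approach, a plain daily mean. It is not necessarily the pairing used in this work, and since the buoy CSV column layout is not documented here, the input is assumed to be hypothetical `(timestamp, sst)` pairs.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def daily_mean_sst(records):
    """Average hourly (timestamp, sst) pairs into one value per calendar day."""
    by_day = defaultdict(list)
    for timestamp, sst in records:
        by_day[timestamp.date()].append(sst)
    return {day: fmean(values) for day, values in sorted(by_day.items())}


# Hypothetical hourly readings for illustration.
hourly = [
    (datetime(2022, 1, 1, 0), 18.0),
    (datetime(2022, 1, 1, 1), 18.2),
    (datetime(2022, 1, 2, 0), 17.5),
]
daily = daily_mean_sst(hourly)
for day, value in daily.items():
    print(day, round(value, 2))
```

A daily mean is only one choice. Matching each buoy reading to the satellite overpass time would be an alternative when the diurnal cycle matters.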

# Code

download_ocean_nasa.py

Downloads Aqua MODIS L3m NetCDF files from NASA Earthdata and converts them to daily CSVs clipped to the Louisiana bounding box.

process_aqua_modis_l3.py

Processes locally stored Aqua MODIS L3m NetCDF files into bounding-box-clipped CSVs. Use this script if the raw .nc files are already available locally.
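The bounding-box clipping that both scripts perform can be illustrated with a short sketch. This is not the code from either script. It uses the spatial domain stated under Key Statistics and assumes that longitudes are stored as negative degrees east. The names `in_bbox` and `clip_to_bbox` are hypothetical.

```python
# Bounds from the stated spatial domain; west longitudes are written as
# negative degrees east, which is an assumption about how Lon is stored.
LAT_MIN, LAT_MAX = 28.43, 30.71
LON_MIN, LON_MAX = -95.58, -88.63


def in_bbox(lat, lon):
    """True when the point lies inside the box, edges included."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def clip_to_bbox(points):
    """Keep (lat, lon, sst) tuples inside the box; NaN SST values are kept."""
    return [p for p in points if in_bbox(p[0], p[1])]


# Hypothetical points for illustration.
points = [
    (29.00, -90.00, 18.4),          # inside
    (29.50, -92.00, float("nan")),  # inside, cloud-masked
    (27.00, -90.00, 19.1),          # south of the box
    (29.00, -85.00, 20.3),          # east of the box
]
clipped = clip_to_bbox(points)
print(len(clipped))
```

Clipping keeps cloud-masked pixels on purpose, so the missingness structure inside the box is left intact.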

analyze_all_years.py

Computes per-year and aggregate missingness statistics across the full 956-day dataset. Reproduces the audit results in the paper.
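The kind of per-year and aggregate missingness summary described here can be sketched as follows. This is a simplified illustration under stated assumptions, not the contents of `analyze_all_years.py`: dates are assumed to be `YYYY-MM-DD` strings, and the function name `missingness_by_year` and the sample rows are hypothetical.

```python
import math
from collections import defaultdict


def missingness_by_year(rows):
    """rows: iterable of (date_string, sst), with date_string as 'YYYY-MM-DD'.

    Returns (per-year NaN fraction, NaN fraction over all rows).
    """
    counts = defaultdict(lambda: [0, 0])  # year -> [missing, total]
    for date_string, sst in rows:
        year = int(date_string[:4])
        counts[year][1] += 1
        if math.isnan(sst):
            counts[year][0] += 1
    per_year = {year: m / t for year, (m, t) in sorted(counts.items())}
    missing = sum(m for m, _ in counts.values())
    total = sum(t for _, t in counts.values())
    aggregate = missing / total if total else math.nan
    return per_year, aggregate


# Hypothetical rows for illustration.
rows = [
    ("2022-01-01", 18.4), ("2022-01-01", math.nan),
    ("2023-06-15", math.nan), ("2023-06-15", math.nan),
    ("2024-03-10", 21.0), ("2024-03-10", 21.3),
]
per_year, aggregate = missingness_by_year(rows)
print(per_year, aggregate)
```

The aggregate is computed from pooled counts, not as the mean of the yearly fractions, so years with more cells weigh more.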


# Data Source

Raw satellite data: NASA Ocean Biology Processing Group (OBPG), https://oceandata.sci.gsfc.nasa.gov/

Buoy data: NOAA National Data Buoy Center (NDBC), https://www.ndbc.noaa.gov/

Processed tensors, cloud mask structure, daily statistics, and benchmark splits are original contributions of this work.





# Contact

Pujit Naga Sai Pavan Kumar Etha

Louisiana Tech University

pavaneatha@live.com



